WarpNet: Weakly Supervised Matching for Single-view Reconstruction
We present an approach to matching images of objects in fine-grained datasets
without using part annotations, with an application to the challenging problem
of weakly supervised single-view reconstruction. This is in contrast to prior
works that require part annotations, since matching objects across class and
pose variations is challenging with appearance features alone. We overcome this
challenge through a novel deep learning architecture, WarpNet, that aligns an
object in one image with a different object in another. We exploit the
structure of the fine-grained dataset to create artificial data for training
this network in an unsupervised-discriminative learning approach. The output of
the network acts as a spatial prior that allows generalization at test time to
match real images across variations in appearance, viewpoint and articulation.
On the CUB-200-2011 dataset of bird categories, we improve the AP over an
appearance-only network by 13.6%. We further demonstrate that our WarpNet
matches, combined with the structure of fine-grained datasets, enable
single-view reconstructions of quality comparable to those obtained with
annotated point correspondences.
Comment: to appear in IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 201
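The role of the spatial prior can be pictured in a few lines of numpy. This is an illustrative toy, not WarpNet itself: `warp` stands in for the network's predicted alignment, and the feature and keypoint arrays are hypothetical inputs.

```python
import numpy as np

def match_with_spatial_prior(feat_a, feat_b, pts_a, pts_b, warp, sigma=10.0):
    """Score keypoint matches between two images (toy sketch).

    feat_a, feat_b: (N, D) appearance descriptors (hypothetical).
    pts_a, pts_b:   (N, 2) keypoint locations in pixels.
    warp: callable mapping points from image A into image B, standing in
          for the network's predicted transformation. It acts as a spatial
          prior that down-weights appearance matches whose locations
          disagree with the predicted alignment.
    """
    # Cosine appearance similarity between all keypoint pairs.
    fa = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    fb = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    app = fa @ fb.T                                   # (N, M)

    # Spatial prior: Gaussian on distance after warping A's points into B.
    warped = warp(pts_a)                              # (N, 2)
    d2 = ((warped[:, None, :] - pts_b[None, :, :]) ** 2).sum(-1)
    prior = np.exp(-d2 / (2 * sigma ** 2))            # (N, M)

    score = app * prior
    return score.argmax(axis=1)                       # best match per point
```

With an identity warp and identical images, the matcher recovers the trivial one-to-one correspondence; the interesting case is when `warp` bridges pose and viewpoint differences that appearance alone cannot.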
Cluster-to-adapt: Few Shot Domain Adaptation for Semantic Segmentation across Disjoint Labels
Domain adaptation for semantic segmentation across datasets consisting of the
same categories has seen several recent successes. However, a more general
scenario is when the source and target datasets correspond to non-overlapping
label spaces. For example, categories in segmentation datasets change vastly
depending on the type of environment or application, yet share many valuable
semantic relations. Existing approaches based on feature alignment or
discrepancy minimization do not take such category shift into account. In this
work, we present Cluster-to-Adapt (C2A), a computationally efficient
clustering-based approach for domain adaptation across segmentation datasets
with completely different, but possibly related categories. We show that such a
clustering objective enforced in a transformed feature space serves to
automatically select categories across source and target domains that can be
aligned for improving the target performance, while preventing negative
transfer for unrelated categories. We demonstrate the effectiveness of our
approach through experiments on the challenging problem of outdoor to indoor
adaptation for semantic segmentation in few-shot as well as zero-shot settings,
with consistent improvements in performance over existing approaches and
baselines in all cases.
Comment: Accepted to L3D workshop at CVPR 202
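The cluster-selection idea can be illustrated with a toy sketch: jointly cluster source and target features, then keep only clusters whose source members are dominated by a single category. This is not the authors' objective; the plain k-means, the purity threshold, and all names here are illustrative stand-ins for the learned transformed feature space.

```python
import numpy as np

def select_alignable_clusters(src_feats, src_labels, tgt_feats, k=4,
                              purity_thresh=0.6, iters=20, seed=0):
    """Toy sketch of cluster selection for cross-label-space adaptation.

    Clusters whose source members mostly share one category are deemed
    safe to align with the target; impure clusters are skipped, which is
    one simple way to avoid negative transfer across disjoint labels.
    """
    X = np.vstack([src_feats, tgt_feats])
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):  # plain k-means in the shared feature space
        d = ((X[:, None] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for c in range(k):
            if (assign == c).any():
                centers[c] = X[assign == c].mean(0)
    n_src = len(src_feats)
    selected = {}
    for c in range(k):
        src_in_c = src_labels[assign[:n_src] == c]
        if len(src_in_c) == 0:
            continue  # target-only cluster: nothing to align against
        vals, counts = np.unique(src_in_c, return_counts=True)
        if counts.max() / counts.sum() >= purity_thresh:
            selected[c] = vals[counts.argmax()]  # cluster -> source class
    return assign, selected
```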
Learning random-walk label propagation for weakly-supervised semantic segmentation
Large-scale training for semantic segmentation is challenging due to the
expense of obtaining training data for this task relative to other vision
tasks. We propose a novel training approach to address this difficulty. Given
cheaply-obtained sparse image labelings, we propagate the sparse labels to
produce guessed dense labelings. A standard CNN-based segmentation network is
trained to mimic these labelings. The label-propagation process is defined via
random-walk hitting probabilities, which leads to a differentiable
parameterization with uncertainty estimates that are incorporated into our
loss. We show that by learning the label-propagator jointly with the
segmentation predictor, we are able to effectively learn semantic edges given
no direct edge supervision. Experiments also show that training a segmentation
network in this way outperforms the naive approach.
Comment: This is a revised version of a paper presented at CVPR 2017 that
corrects some equations. See footnote
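The propagation step can be illustrated with the standard absorbing-random-walk formulation on a small graph. This is a generic sketch, not the paper's learned parameterization: the affinity matrix `W` here is a placeholder for the learned edge weights.

```python
import numpy as np

def propagate_labels(W, seed_idx, seed_labels, n_classes):
    """Dense soft labels from sparse seeds via random-walk absorption.

    W: (N, N) symmetric nonnegative affinity matrix over pixels/nodes.
    A walker started at an unlabeled node is absorbed when it first hits
    a seed; the probability of hitting a seed of class c gives a soft
    dense labeling. Because it is the solution of a linear system in the
    edge weights, the labeling is differentiable with respect to them.
    """
    N = len(W)
    P = W / W.sum(axis=1, keepdims=True)       # row-stochastic transitions
    u = np.setdiff1d(np.arange(N), seed_idx)   # unlabeled nodes
    Y = np.zeros((len(seed_idx), n_classes))
    Y[np.arange(len(seed_idx)), seed_labels] = 1.0
    # Hitting probabilities: solve (I - P_uu) H = P_ul Y
    soft_u = np.linalg.solve(np.eye(len(u)) - P[np.ix_(u, u)],
                             P[np.ix_(u, seed_idx)] @ Y)
    out = np.zeros((N, n_classes))
    out[seed_idx] = Y
    out[u] = soft_u
    return out
```

On a 4-node chain with seeds of class 0 and 1 at the two ends, the interior nodes receive soft labels proportional to their distance from each seed, exactly the kind of "guessed dense labeling" a segmentation network can then be trained to mimic.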
Tuned Contrastive Learning
In recent times, contrastive learning based loss functions have become
increasingly popular for visual self-supervised representation learning owing
to their state-of-the-art (SOTA) performance. Most of the modern contrastive
learning methods generalize only to one positive and multiple negatives per
anchor. A recent state-of-the-art method, the supervised contrastive (SupCon)
loss, extends self-supervised contrastive learning to the supervised setting
by generalizing to multiple positives and negatives in a batch, improving upon
the cross-entropy loss. In this paper, we propose a novel contrastive loss
function -- Tuned Contrastive Learning (TCL) loss, that generalizes to multiple
positives and negatives in a batch and offers parameters to tune and improve
the gradient responses from hard positives and hard negatives. We provide
theoretical analysis of our loss function's gradient response and show
mathematically how it is better than that of SupCon loss. We empirically
compare our loss function with SupCon loss and cross-entropy loss in supervised
setting on multiple classification-task datasets to show its effectiveness. We
also show the stability of our loss function to a range of hyper-parameter
settings. Unlike the SupCon loss, which applies only to the supervised
setting, TCL extends naturally to the self-supervised setting, where we
empirically compare it with various SOTA self-supervised learning methods.
Overall, the TCL loss achieves performance on par with SOTA methods in both
supervised and self-supervised settings.
Comment: Preprint Version
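The multi-positive structure that TCL and SupCon share can be sketched in numpy. Note this is the SupCon-style loss only: TCL's extra tuning parameters for hard positives and negatives are omitted, and the exact form should be taken from the paper.

```python
import numpy as np

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive (SupCon) loss for one batch.

    z: (N, D) L2-normalized embeddings; labels: (N,) integer classes.
    Each anchor contrasts against every other sample in the batch, and
    every same-class sample counts as a positive, generalizing the usual
    one-positive-per-anchor contrastive setup.
    """
    N = len(z)
    sim = z @ z.T / temperature
    mask_self = ~np.eye(N, dtype=bool)
    logits = np.where(mask_self, sim, -np.inf)  # exclude self-pairs
    # Numerically stable log-softmax over all non-anchor samples.
    m = logits.max(1, keepdims=True)
    log_prob = logits - m - np.log(np.exp(logits - m).sum(1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & mask_self
    # Mean log-probability of the positives, per anchor.
    per_anchor = np.where(pos, log_prob, 0.0).sum(1) / np.maximum(pos.sum(1), 1)
    return -per_anchor.mean()
```

Embeddings that cluster by class yield a much lower loss than the same embeddings with mismatched labels, which is the gradient signal both losses exploit.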
Deep Network Flow for Multi-Object Tracking
Data association problems are an important component of many computer vision
applications, with multi-object tracking being one of the most prominent
examples. A typical approach to data association involves finding a graph
matching or network flow that minimizes a sum of pairwise association costs,
which are often either hand-crafted or learned as linear functions of fixed
features. In this work, we demonstrate that it is possible to learn features
for network-flow-based data association via backpropagation, by expressing the
optimum of a smoothed network flow problem as a differentiable function of the
pairwise association costs. We apply this approach to multi-object tracking
with a network flow formulation. Our experiments demonstrate that we are able
to successfully learn all cost functions for the association problem in an
end-to-end fashion, which outperform hand-crafted costs in all settings. The
integration and combination of various sources of inputs becomes easy and the
cost functions can be learned entirely from data, alleviating tedious
hand-designing of costs.
Comment: Accepted to CVPR 201
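The key trick of smoothing a combinatorial matching so its optimum becomes differentiable in the costs can be illustrated with generic entropic regularization (Sinkhorn iterations). This is not the paper's exact smoothed network flow, only a sketch of the same principle: a smooth optimum is a differentiable function of the pairwise association costs, so those costs can be learned by backpropagation.

```python
import numpy as np

def smoothed_assignment(costs, eps=0.1, iters=200):
    """Entropy-smoothed soft assignment between detections across frames.

    costs: (N, M) pairwise association costs (e.g. between detections in
    consecutive frames). Smaller eps -> closer to the hard min-cost
    matching; larger eps -> smoother, better-conditioned gradients.
    """
    K = np.exp(-costs / eps)            # Gibbs kernel from the costs
    u = np.ones(costs.shape[0])
    v = np.ones(costs.shape[1])
    for _ in range(iters):              # alternate row/column scaling
        u = 1.0 / (K @ v)
        v = 1.0 / (K.T @ u)
    return u[:, None] * K * v[None, :]  # near doubly-stochastic matching
```

For a cost matrix with a clear diagonal structure, the soft assignment concentrates on the diagonal, and every entry varies smoothly with the input costs.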